R, RStudio and Tidyverse Stack

The R Language

R is a scripting language and a very powerful tool for data analysis and presentation, primarily thanks to its huge user base and their dedication to developing free and open-source libraries/packages covering a vast range of knowledge domains.

The Comprehensive R Archive Network (CRAN) is the canonical repository for R packages. Note that almost all* packages hosted on CRAN may be used within a Shiny app.

*Packages dependent on parallel or distributed computing are unlikely to be supported; contact shinyapps-support@rstudio.com with any questions.

Learning R

There are thousands of online resources for learning R, many of which are available for free.

Two I’d like to personally endorse are:

The R Console

R is the name of both the programming language and the console in which many users of R write and evaluate their code.

To use R on your local machine you must download and install the R Console; it’s available for Windows, OS X and Linux.

Like all consoles, this application provides [only] the following functionality:

  • Write code and script files
  • Evaluate code and script files
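As a rough sketch of what “write and evaluate script files” looks like in practice, the snippet below writes a two-line script to a temporary file and evaluates it with source(). The script contents and the names script_path, script_env and total are made up for illustration.

```r
# Write a tiny script to a temporary file (contents are illustrative only)
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10", "total <- sum(x)"), script_path)

# Evaluate the script inside its own environment so it does not
# overwrite anything in the current workspace
script_env <- new.env()
source(script_path, local = script_env)

# Objects created by the script now live in script_env
print(script_env$total)
```

Typing the same two lines directly at the console prompt would evaluate them in exactly the same way; the file just makes them repeatable.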

RStudio

RStudio is a free, open-source IDE (integrated development environment) that provides an extremely powerful and friendly interface for developing with R.

IDEs make it easier to manage your programming, providing the following features:

RStudio, however, provides much more exciting features on top of a standard IDE:

*more on RMarkdown after some actual code.

Base R and R Packages

Installing R adds the machinery necessary to run R code to your computer, along with a number of “base” packages including stats, utils and graphics.

See stackoverflow.com/a/9705725/1659890 for further details.
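If you want to check this on your own machine, something like the following should list the packages that came with R itself. This is a minimal sketch: base_pkgs is just an illustrative variable name, and the exact list may differ between R versions.

```r
# Packages flagged with priority "base" are installed alongside R itself
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# The three packages named above should all appear in that list
print(c("stats", "utils", "graphics") %in% base_pkgs)

# search() shows which packages are attached in the current session
print(search())
```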

These packages will not get you far in life, unless you’re prepared to write a lot of code from scratch.

But you can be fairly confident that any code sample you see online that uses only “base R” will run without you having to install additional libraries.

Installing Packages

If a package is on CRAN then it is “installed” onto your machine using the following code. You’re advised to write this directly into the console and not into your documents:

install.packages("ggplot2")
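If an install call does end up in a script anyway, one common pattern is to guard it so that it only runs when the package is missing. The helper below, install_if_missing(), is a hypothetical name of my own and not part of R or any package; treat it as a sketch rather than a recommendation.

```r
# install_if_missing() is a hypothetical helper, not a function from R itself
install_if_missing <- function(pkg) {
  # requireNamespace() returns FALSE (instead of erroring) when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package is usable after the attempt
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing is downloaded here
already_there <- install_if_missing("stats")
print(already_there)
```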

Once a library is installed, its functions can be accessed with the package name as a prefix, for example ggplot2::geom_point().

However, libraries are designed to be used after being loaded:

library(ggplot2)
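To see the difference between the two styles without installing anything, here is a small sketch using the tools package, which ships with R but is not attached at startup. The example string and the names title_a and title_b are illustrative.

```r
# Style 1: call a function through its namespace without loading the package
title_a <- tools::toTitleCase("interactive visualisation")

# Style 2: load the package once, then call its functions by bare name
library(tools)
title_b <- toTitleCase("interactive visualisation")

# Both styles reach the same function, so the results match
print(identical(title_a, title_b))
```

The prefix style is handy for one-off calls; loading with library() saves typing when a package is used throughout a document.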

Warning on Packages

While packages are incredibly useful, it is important not to offload all thought/development to packages for three important reasons:

RStudio-backed Packages

  • The “tidyverse” is a collection of packages maintained by RStudio devs [particularly Hadley Wickham]
  • tidyverse packages play extremely nicely together
  • tidyverse packages are extremely useful for preparing data for interactive visualisations
  • tidyverse packages are highly optimised, often specifically around nitpicky details of base R (readr is a good example of this)
  • tidyverse is the backbone of the recently published, free online book R for Data Science

Tidyverse package workflow

  • Import with readr
  • Reshape with tidyr
  • Filter, modify and query with dplyr
  • Visualise with ggplot2 (but that’s not interactive…)

RStudio has links to a fantastic cheatsheet on the tidyverse (the complicated reshaping/filtering part of it) available under Help > Cheatsheets
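The four workflow steps can be sketched end to end as below. Everything about the data is invented for illustration (the countries, the y2016/y2017 columns, the cutoff of 8), and the sketch assumes a reasonably recent tidyverse: pivot_longer() needs tidyr 1.0.0 or later, so with the older versions shown elsewhere in these slides you would reach for gather() instead.

```r
library(readr)
library(tidyr)
library(dplyr)
library(ggplot2)

# Made-up data written to a temporary CSV so there is something to import
csv_path <- tempfile(fileext = ".csv")
writeLines(c("country,y2016,y2017", "A,10,12", "B,7,9"), csv_path)

# Import with readr ("cdd" = one character column, two double columns)
wide <- read_csv(csv_path, col_types = "cdd")

# Reshape with tidyr: one row per country/year pair
long <- pivot_longer(wide, cols = c(y2016, y2017),
                     names_to = "year", values_to = "value")

# Filter, modify and query with dplyr
summary_tbl <- long %>%
  filter(value > 8) %>%
  group_by(country) %>%
  summarise(total = sum(value))

# Visualise with ggplot2 (a static plot object, not an interactive one)
p <- ggplot(long, aes(x = year, y = value, colour = country)) +
  geom_point()

print(summary_tbl)
```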

Installing the tidyverse

There are currently over 15 packages in the tidyverse, and it’s a pain installing each of them separately. So RStudio have made everything easy to manage via the tidyverse package:

# install.packages("tidyverse")
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1.9000     ✔ purrr   0.2.4     
## ✔ tibble  1.4.2          ✔ dplyr   0.7.4     
## ✔ tidyr   0.8.0          ✔ stringr 1.3.0     
## ✔ readr   1.1.1          ✔ forcats 0.3.0
## ── Conflicts ────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
tidyverse_packages()
##  [1] "broom"       "cli"         "crayon"      "dplyr"       "dbplyr"     
##  [6] "forcats"     "ggplot2"     "haven"       "hms"         "httr"       
## [11] "jsonlite"    "lubridate"   "magrittr"    "modelr"      "purrr"      
## [16] "readr"       "readxl\n(>=" "reprex"      "rlang"       "rstudioapi" 
## [21] "rvest"       "stringr"     "tibble"      "tidyr"       "xml2"       
## [26] "tidyverse"
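The “Conflicts” lines in the output above say that, once dplyr is attached, a bare filter() or lag() call reaches the dplyr version. As far as I know the stats versions remain available through the namespace prefix; a minimal sketch with made-up numbers:

```r
x <- c(1, 2, 3, 4, 5)

# stats::filter() applies a linear filter; with three equal weights
# it computes a centred moving average of width 3
moving_avg <- stats::filter(x, rep(1 / 3, 3))

# The first and last positions have no complete window, so they are NA
print(as.numeric(moving_avg))
```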

R Syntax Catch-up

We’re going to be using what some users consider advanced R programming during today’s session, but even experienced R users often get tripped up over brackets. It’s good to cement into your head what each bracket is for so that when you read code you know what’s going on:

Round brackets ( ): encapsulate the arguments for a function. In the case of rep("Hello World", 2) the round brackets encapsulate the two arguments passed to the function rep; arguments are delimited by commas.

Square brackets [ ]: used for extracting parts (rows, columns, individual elements) from data structures - that’s their only use.
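A few extraction examples in base R may help here. The vector scores and the data frame df are invented for illustration.

```r
# A named vector and a small data frame to extract from
scores <- c(a = 10, b = 20, c = 30)
df <- data.frame(id = 1:3, value = c(2.5, 3.5, 4.5))

# Individual elements: by position, or by name with double brackets
second_score <- scores[2]
b_score <- scores[["b"]]

# Rows and columns of a data frame use [row, column]
second_row <- df[2, ]
value_col <- df[, "value"]

# A logical condition in the row position keeps only matching rows
big_rows <- df[df$value > 3, ]

print(b_score)
print(big_rows)
```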

Curly braces { }: used for containing expressions. When writing mathematical expressions by hand, round brackets are usually used for controlling precedence (order of operations). Round brackets also work for this in R, but to keep each bracket’s role distinct you should write 2*{x+1}^2.
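As a quick check, the sketch below evaluates the expression both ways for an arbitrary value of x. To the best of my knowledge R treats the two groupings identically here, so the choice between them is one of style.

```r
x <- 3

# Braces controlling precedence, as written above
with_braces <- 2 * {x + 1}^2

# The same expression with the round brackets used in hand-written maths
with_parens <- 2 * (x + 1)^2

print(c(with_braces, with_parens))
```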

Braces are necessary where more than one thing is being done in an individual argument:

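rep(
  "strings",
  {
    # braces let a single argument hold several expressions;
    # the value of the last one (5) is what rep() receives
    no1 <- 2
    no1 + 3
  }
)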
## [1] "strings" "strings" "strings" "strings" "strings"